(57) Abstract: The foregoing problems are solved and a technical advance is achieved by a computer-implemented system for
providing a digital watermark in an audio signal. In a preferred embodiment, an audio file (108), such as a .WAV file, representing an
audio signal to be watermarked is processed using an algorithm of the present invention herein referred to as the "PAWS algorithm"
(104) to determine and log the location and number of opportunities that exist for inserting a watermark into the audio signal such
that it will be masked by the audio signal. The user can adjust (17) certain parameters (112) of the PAWS algorithm (104) before
the audio file is processed. A/B/X testing between the original and watermarked files is also supported to allow the user to undo or
re-encode the watermark, if desired.



WO 01/29691

PCT/US00/28166



SYSTEM FOR PROVIDING A DIGITAL WATERMARK IN AN AUDIO SIGNAL 
Background of the Invention

This invention relates generally to signal processing systems, and more particularly 
to a signal processing system for providing a digital watermark in an audio signal. 

This application is related to U.S. Patent No. 5,404,377 to Donald W. Moses and U.S.
Patent No. 5,612,943 to Robert W. Moses et al., both of which are hereby incorporated by
reference in their entireties. 

With the advent of computer networks and digital multimedia, protection of 
intellectual property has become a prime concern for creators and publishers of digitized 
copies of copyrightable works, such as musical recordings, movies, and video games. One
method of protecting copyrights in the digital domain is to use digital "watermarks."
Digital watermarks can be used to mark each individual copy of a digitized work with 
information identifying, inter alia, the title, copyright holder, and even the licensed owner of 
a particular copy. Watermarks can also serve to allow for secured metering and support of 
other distribution systems of a given media content. In theory, almost any item of 
information could be encoded and used as a watermark. 

Digital watermarks are created by encoding a data signal, hereinafter referred to as 
the "watermark signal," "watermark data," or simply "watermark", which is then 
integrated into a larger content signal, hereinafter referred to as the "audio signal", to 
create a composite signal. Ideally, the composite signal should contain minimal or no 
perceptible artifacts of the watermark. 

It is known in the art that every audio signal generates a perceptual concealment 
function which masks audio distortions existing simultaneously with the signal. 
Accordingly, any distortion, or noise, introduced into the transmission channel if properly 
distributed or shaped, will be masked by the audio signal itself. Such masking may be 
partial or complete, leading either to increased quality compared to a system without noise 
shaping, or to near-perfect signal quality that is equivalent to a signal without noise. In 
either case, such "masking" occurs as a result of the inability of the human perceptual 
mechanism to distinguish between two signal components, one belonging to the audio signal 
and the other belonging to the noise, in the same spectral, temporal or spatial locality. An
important effect of this limitation is that the perceptibility of the noise by a listener can be
zero, even if the signal-to-noise ratio is at a measurable level. Ideally, the noise level at all 
points in the audio signal space is exactly at the level of just-noticeable distortion, which 
limit is typically referred to as the "perceptual entropy envelope" or "PEE". 

Hence, the main goal of noise shaping is to minimize the perceptibility of distortions
by advantageously shaping them in time or frequency so that as many of their components as
possible are masked by the audio signal itself. See Nikil Jayant et al., Signal Compression
Based on Models of Human Perception, 81 Proc. of the IEEE 1385 (1993).

"Perceptual coding" techniques employing the above-discussed principles are 
presently used in signal compression and are based on three types of masking: frequency
domain, time domain and noise level. The basic principle of frequency domain masking is 
that when certain strong signals are present in the audio band, other lower level signals, 
close in frequency to the stronger signals, are masked and not perceived by a listener. Time 
domain masking is based on the fact that certain types of noise and tones are not perceptible 
immediately before and after a larger signal transient. Noise masking takes advantage of 
the fact that a relatively high broadband noise level is not perceptible if it occurs 
simultaneously with various types of stronger signals. 

Perceptual coding forms the basis for precision audio sub-band coding (PASC), as 
well as other coding techniques used in compressing audio signals for mini-disc (MD) and 
digital compact cassette (DCC) formats. Specifically, such compression algorithms take 
advantage of the fact that certain signals in an audio channel will be masked by other 
stronger signals to remove those masked signals in order to be able to compress the 
remaining signal into a lower bit-rate channel. 

One of the deficiencies of conventional systems for adding a watermark to an audio 
signal is that the watermark is encoded on a single frequency band or channel, such that 
opportunities for inserting the watermark such that it is masked by the PEE of the audio 
signal are limited. In addition, there exists no option to provide redundancy; that is, the
entire watermark is included only once in the audio signal, such that if any part of it is
damaged, it is difficult, if not impossible, to recover. Finally, there is no way to "force" an
opportunity such that a minimum time between transmissions of the watermark data can be
enforced or to "create" an opportunity where one almost exists by changing the gain of the
audio signal.

Therefore, what is needed is an improved system for providing a digital watermark in 
an audio signal. 
Summary of the Invention

The foregoing problems are solved and a technical advance is achieved by a
computer-implemented system for providing a digital watermark in an audio signal. In a
preferred embodiment, an audio file, such as a .WAV file, containing an audio signal to be
watermarked is processed by an encoder using an algorithm of the present invention herein
referred to as the "PAWS algorithm" to determine and log the location and number of
opportunities that exist for inserting a watermark into the audio signal such that it will be
masked by the PEE of the audio signal. The user can adjust certain parameters of the
PAWS algorithm before the audio file is processed. A/B/X testing between the original and
watermarked files is also supported to allow the user to undo or re-encode the watermark, if
desired.

In particular, the encoder divides the frequency spectrum into seven "critical bands",
each of which includes two carrier frequencies for representing logic 0 and logic 1,
respectively. The basic encoding process is as follows. First, the user sets up the desired
parameters for the algorithm, including selecting which critical bands are to be active,
specifying, in dB, the desired "headroom" between the PEE of the audio signal and the
amplitude of the encoded watermark signal transmitted in each active band, and specifying
the maximum time between transmissions of the encoded watermark signal.

If the encoding is not being performed in real-time, the user executes a
preconditioning phase. During preconditioning, the encoder runs through the entire .WAV
file and logs watermark opportunities according to the PAWS algorithm and the parameters
specified by the user. In addition, the encoder detects "near-miss" opportunities in the
audio signal; that is, points in the audio signal that would constitute opportunities with a
small adjustment to the gain. The encoder adjusts the gain of the audio signal at that point
to create an opportunity therefrom. The preconditioned audio signal is written back to a
.WAV file.

In a preferred embodiment, the watermark is formatted as a frame of 32 characters.
During operation, the original or preconditioned .WAV file is input to the encoder, which
monitors each active critical band of the audio signal to detect opportunities for inserting
watermark data in accordance with the PEE of the signal within the band, as well as the
user-defined parameters. The existence and location of each opportunity is logged and the
encoder determines how many bytes of the watermark word (a "subframe") may be
transmitted during that opportunity, according to the data rate of that band, by measuring
the width of an opportunity and dividing by the data rate, which yields the size of the data
transmission. The encoder encodes the watermark using Gaussian Minimum Shift Keying
("GMSK") modulation and incorporates the encoded subframes of the watermark data block
into the audio signal at the opportunity.

In one aspect, at each opportunity, a timer is reset to a maximum time between
opportunities, which is either a default value or a value selected by a user. If the timer
times out before the next opportunity is detected, the encoder "forces" an opportunity by
cross-fading in an 18 kHz low pass filter ("LPF") to clean out the band above 18 kHz,
transmitting the watermark signal using GMSK modulation at carrier frequencies 18.5 kHz
(for logic 0) and 19.5 kHz (for logic 1) and a data rate of 1200 bps, and then cross-fading out
the LPF.

In the preferred embodiment, each portion of watermark data to be inserted at a
given opportunity is preceded by a 4-bit preamble. In addition to the four preamble bits,
additional bits must be allocated in each subframe to indicate which piece of the overall
watermark the present burst carries. If the seven bands are used, there are a minimum of
16 bits per transmission. Therefore, four more bits may be used to indicate which character
the present character is and there are at least eight bits left over to carry actual watermark
data. If a higher frequency band carries more than 16 bits, then the preamble indicates the
index of the first character of the transmission.

Alternatively, rather than using a 4-bit index, one preamble could be
assigned to indicate the start of a frame and another assigned to the rest of the frame, in
which case 12 bits of each transmission would be left for carrying data.

In any event, each subframe of watermark data is modulated using GMSK
modulation centered at the geometric mean of the two carrier frequencies within the band
and mixed with the audio signal at a level defined by the user ("headroom"). The resultant
watermarked audio signal is stored in a file in memory.

Information concerning the total number of opportunities and the average and
maximum time between them is made available to the user so that he or she can determine
how well the current settings for the algorithm parameters performed. At this point, the
user may wish to change some of the parameters, for example, if the average time between
transmissions is too great or the total number of opportunities is too small.

Once the audio file has been processed, the user can audition the original .WAV file
against the watermarked audio file. A conventional .WAV viewer window is provided for
this purpose, with controls for advancing to the next or previous watermark position and for
auditioning the original ("A"), watermarked ("B"), or unknown random ("X") version, which
allows a user to listen to the original or watermarked version without knowing which
version they are listening to, thereby eliminating any personal bias that might affect the
user's perception of the watermark. During the auditioning phase, the user may amplify or
attenuate the level of each watermark instance via a level control with a range of +/- 20 dB.
This level will be applied to that instance of the watermark during the next run of the
encoder.

Once the user has auditioned the watermarked file, the file can be saved in any one of
a number of known formats. The encoding process is now complete.

On the decoding end, a decoder decodes the watermark from the watermarked signal
using GMSK demodulation. The result of the GMSK demodulation is, for each band, a
"random" stream of 0's and 1's.

The watermark signal is detected from the data stream output by each of the GMSK
demodulators as follows. First, the data stream is sampled at a particular sample rate "Fs".
If the baud rate ("Fb") is related to the sample rate by a known ratio ("R"), e.g., R=Fs/Fb,
then the output from the GMSK demodulator can be routed through a sliding window of
width R and observed to detect all 1's or all 0's, indicating what appears to be a valid bit.
Using four of these sliding comparators, the full preamble can be detected, thus indicating
the start of a transmission of a watermark subframe.

Once a preamble has been detected, a comparator of width R is used to detect each
bit of the subframe. If anything but all 0's or all 1's is detected in any bit cell, the whole
subframe is discarded, since there was either a faulty preamble detection (e.g., it was really
audio information that looked like the preamble) or the signal was negatively impacted by
noise during transmission. If R-1 or R+1 0's or 1's are detected, the sample rate might be off
by a fraction, so the discrepancy is ignored and the bit counter is reset upon the next state
change.
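The bit-cell test described above can be sketched in Python. This is a minimal illustration rather than the patented decoder: it assumes the demodulator output is already a list of 0/1 samples taken at exactly R samples per bit, it uses the example preamble 1010 given later in the description, and it omits the R-1/R+1 tolerance. All function names are hypothetical.

```python
PREAMBLE = (1, 0, 1, 0)  # example pattern from the description (hex A)


def cell_value(samples, start, r):
    """Return 0 or 1 if all r samples of the bit cell agree, else None."""
    cell = samples[start:start + r]
    if len(cell) < r:
        return None
    first = cell[0]
    return first if all(s == first for s in cell) else None


def find_preamble(samples, r, preamble=PREAMBLE):
    """Slide over the stream; return the index just past the first preamble."""
    span = r * len(preamble)
    for start in range(len(samples) - span + 1):
        if all(cell_value(samples, start + i * r, r) == bit
               for i, bit in enumerate(preamble)):
            return start + span
    return None


def decode_subframe(samples, start, r, nbits):
    """Read nbits bit cells; discard the whole subframe if any cell is mixed."""
    bits = []
    for i in range(nbits):
        value = cell_value(samples, start + i * r, r)
        if value is None:
            return None
        bits.append(value)
    return bits
```

A caller would first locate the preamble with `find_preamble` and then pass the returned index to `decode_subframe`; a `None` result corresponds to the discarded-subframe case in the text.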

In one embodiment, the entire watermark is sent once, with the various subframes
transmitted in the various active critical bands, such that a portion of the watermark may
be sent in each of the active bands, thereby increasing the number of opportunities for
inserting the watermark. In another embodiment, the entire watermark is inserted in each
of the bands, such that the watermark appears seven times in the watermarked audio signal
(assuming all of the bands are designated as active), thereby providing redundancy.

A technical advantage achieved with the invention is that it is capable of "forcing" an 
opportunity if no opportunities have been detected for a predefined period of time, thereby 
to ensure that all of the watermark data is transmitted. 

A further technical advantage achieved with the invention is that it operates in seven
critical bands, thereby providing increased opportunities for including the watermark data
and the option for redundancy, where desirable.

Another technical advantage achieved with the invention is that the audio signal can 
be preconditioned such that if a "near-opportunity" is detected, a filter can be used to 
change the frequency response of the system to create an opportunity. 
Brief Description of the Drawings

Fig. 1 is a block diagram of the system of the present invention for inserting a digital 
watermark in an audio signal. 

Fig. 1A is a block diagram of an encoding portion of the system of Fig. 1.

Fig. 2 illustrates an exemplary user interface screen of the system of the present
invention.

Fig. 3 is a block diagram of a preconditioner of the encoding portion of Fig. 1A.

Fig. 4 is a block diagram of an encoder of the encoding portion of Fig. 1A.

Fig. 5 is a flowchart of the operation of the encoding portion of Fig. 1A.

Fig. 6 is a block diagram of a decoding portion of the system of Fig. 1.
Detailed Description of a Preferred Embodiment 

As previously indicated, in accordance with features of the present invention, the
frequency spectrum is divided into seven "critical bands," as shown below in Table I. Each
of these bands includes two carrier frequencies for representing logic 0 and logic 1,
respectively. The data rate of each band, in bits per second ("bps"), varies and is specified by
the entry for the band in the column designated "Data Rate (bps)". For example, band #1 is
defined as the range of frequencies from 1,281 Hz to 1,721 Hz. Logic 0 and logic 1 are
represented within band #1 by 1,387 Hz and 1,607 Hz, respectively. The data rate for band
#1 is 320 bps.



Band #    Lower Band    Upper Band    Logic 0 Carrier    Logic 1 Carrier    Data Rate
          Edge (Hz)     Edge (Hz)     Freq. (Hz)         Freq. (Hz)         (bps)

1         1,281         1,721         1,387              1,607              320
2         1,721         2,323         1,856              2,157              320
3         2,323         3,212         2,525              2,970              640
4         3,212         4,439         3,500              4,114              640
5         4,439         6,387         4,880              5,854              1,280
6         6,387         9,401         7,013              8,521              1,280
7         9,401         15,502        10,543             13,593             2,560

Table I

It should be noted that data rates have been chosen that are related by powers of
two, allowing the decoder to derive a master bit clock ("MBC") from any (or all) bands and
utilize the MBC for all bands.
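The power-of-two relationship among the data rates in Table I can be checked with a short Python sketch. The table values are copied from Table I. The choice of the fastest band's rate as the master bit clock is an assumption made here for illustration, and the function names are hypothetical. The geometric-mean helper reflects the statement elsewhere in this description that each modulator is centered at the geometric mean of its band's two carriers.

```python
import math

# Rows copied from Table I:
# (band, lower edge Hz, upper edge Hz, logic-0 carrier Hz, logic-1 carrier Hz, bps)
TABLE_I = [
    (1, 1281, 1721, 1387, 1607, 320),
    (2, 1721, 2323, 1856, 2157, 320),
    (3, 2323, 3212, 2525, 2970, 640),
    (4, 3212, 4439, 3500, 4114, 640),
    (5, 4439, 6387, 4880, 5854, 1280),
    (6, 6387, 9401, 7013, 8521, 1280),
    (7, 9401, 15502, 10543, 13593, 2560),
]


def center_frequency_hz(logic0_hz, logic1_hz):
    """Geometric mean of a band's two carrier frequencies."""
    return math.sqrt(logic0_hz * logic1_hz)


def clock_divisors(table=TABLE_I):
    """Map band number to the divisor from an assumed master bit clock.

    Assumption: the master bit clock runs at the fastest band's data rate.
    """
    master_bps = max(row[5] for row in table)
    return {row[0]: master_bps // row[5] for row in table}


def is_power_of_two(n):
    """True for 1, 2, 4, 8, ..."""
    return n > 0 and n & (n - 1) == 0
```

Under that assumption every band's bit clock is the master clock divided by 1, 2, 4 or 8, which is what lets a single recovered clock serve all seven bands.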

Fig. 1 illustrates a system 10 embodying features of the present invention. In a
preferred embodiment, the system 10 is implemented using a conventional computer 12
having a display 14, an audio input device, such as a microphone, 16, one or more user input
devices, such as a keyboard and/or a mouse, collectively designated by a reference numeral
17, and an audio output device, such as a speaker, 18. As illustrated in Fig. 1, and as will be
described in greater detail below, the system 10 includes an encoding portion 20 and a
decoding portion 22. It will be recognized that a single computer, such as the computer 12,
may be used to implement one or both of the encoding and decoding portions 20, 22.

Referring to Fig. 1A, the encoding portion 20 of the system of the present invention
comprises an encoder 102 that implements a PAWS algorithm 104 and a memory device 106
connected to the encoder 102. The memory device 106 is used to store various files for use
in connection with the present invention, including an original audio file, such as a .WAV
file, 108 containing the original audio data to be watermarked and a watermark file 110
containing the watermark data. Also stored in the memory device 106 is a user parameters
file 112 for storing user parameters specified using a user interface screen, such as a screen
200 shown in Fig. 2.

Referring to Fig. 2, in accordance with a feature of the present invention, the user is
prompted to specify certain parameters for use in controlling certain aspects of the operation
of the encoding portion 20. In particular, using the screen 200, the user can specify, in dB,
in a "Headroom" field 202, the desired headroom between the PEE of the audio signal and
the amplitude of the encoded watermark signal. In addition, the user can designate as
active one or more of the seven critical bands by checking a checkbox 204 associated with
each band selected to be active. The default state for each critical band is active, since this
allows the most opportunities to encode the watermark signal. Although not shown in Fig.
2, it should be noted that headroom can be designated for each of the active critical bands
individually as well. Finally, the user can specify, in seconds, the maximum time that
should be allowed to elapse between transmissions of watermark data with an entry in a
"Max. Time Between Transmissions" field 206. The default value for this parameter is 3
seconds, the goal being to transmit 16 bytes of encoded watermark data every three seconds.
Once the desired parameters have been specified, the user clicks on or otherwise selects an
"OK" button 208 to store the parameters in the user parameters file 112 (Fig. 1A).
Referring again to Fig. 1A, after the user enters the parameters, as described with
reference to Fig. 2, if the audio data is to be watermarked other than in real-time, the user
enters a preconditioning phase, in which the audio signal stored in the original audio file 108
is input to a preconditioner 114. As will be described in greater detail with reference to Fig.
3, the preconditioner 114 preconditions the audio signal by detecting near-miss
opportunities and then adjusting the gain of the audio signal to create useable opportunities
from such near-misses. Once the audio signal is preconditioned, it is stored in a
preconditioned audio file 116 in the memory device 106.

In particular, referring to Fig. 3, the preconditioner 114 comprises a number of band
pass filters ("BPFs") 300, each of which is designed to pass one of the critical bands
designated above in Table I. The output of each of the BPFs 300 is input to a respective
near miss detector ("NMD") 302, which detects near-miss opportunities in the audio signal
in the respective critical band. In particular, each NMD 302 determines how close the audio
signal came to an opportunity. For example, if the encoder (Fig. 4) requires the audio signal
level to remain below a certain threshold for a certain duration and the audio signal level
actually goes above that threshold by 3 dB for 5 ms, the NMD 302 will record the fact that
during that period of time the signal energy in the respective critical band was 3 dB too
high for an opportunity to occur.
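The near-miss measurement in the example above can be sketched as follows. This is a simplified illustration under stated assumptions: the band level is available as a list of dB readings over the candidate window, and a hypothetical `max_adjust_db` parameter stands in for the unspecified size of a "small adjustment". The function name is also hypothetical.

```python
def near_miss_correction_db(levels_db, threshold_db, max_adjust_db=6.0):
    """Return the gain change (dB, <= 0) that would create an opportunity.

    levels_db     -- band level readings over the candidate window, in dB
    threshold_db  -- level the band must stay below for an opportunity
    max_adjust_db -- hypothetical limit on what counts as a near miss

    Returns 0.0 if the window already qualifies, a negative gain if a small
    attenuation would make it qualify, or None if it is not a near miss.
    """
    peak_excess = max(levels_db) - threshold_db
    if peak_excess <= 0:
        return 0.0              # already below threshold: nothing to correct
    if peak_excess > max_adjust_db:
        return None             # too far over threshold to count as a near miss
    return -float(peak_excess)  # attenuate by exactly the measured excess
```

For the 3 dB example in the text, a window peaking 3 dB over the threshold yields a correction of -3 dB.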

The output of each NMD 302 is a control signal to a respective band reject filter
("BRF") 304 that adjusts how much the BRF attenuates the critical band. In the above
example, the control signal would cause the BRF 304 to attenuate the band by 3 dB to force
the opportunity at that point in time. The default gain of all of the BRFs 304 is 0 dB;
therefore, their sum is the same as the input signal (no change). Whenever any one of the
BRFs 304 attenuates a band, the resulting signal is modified so that when it is run through
the encoder the opportunities will actually occur. Each BRF 304 is configured similarly to a
parametric equalizer, which is known by those skilled in the art to be a common audio
processing device used in audio systems.

The signals output from each of the BRFs 304 are input to a summer 306, which
reconstructs the audio signal and outputs it to the preconditioned audio file 116 (Fig. 1A).

Referring again to Fig. 1A, after preconditioning (for non-real-time applications) or
irrespective of preconditioning (for real-time applications), the audio signal is input to the
encoder 102 from the preconditioned audio file 116, if one exists, or from the original audio
file 108, if the audio signal has not been preconditioned, along with the watermark data
stored in the watermark data file 110 and the user parameters stored in the user parameters
file 112. It will be recognized that Fig. 1A illustrates a non-real-time application, where
preconditioning does take place.

Referring now to Figs. 1 and 4, the operation of the encoder 102 will be described in
greater detail. Initially, an audio signal from the original audio file 108 (for real-time
applications) or from the preconditioned audio file 116 (for non-real-time applications) is
simultaneously filtered by seven BPFs 400 each tuned to one of the critical bands defined in
Table I. The output of each of the BPFs 400 is input to a detector 402, which monitors the
respective critical band for opportunities to insert watermark data into the audio signal
according to the PAWS algorithm 104 and the parameters specified by the user. When such
an opportunity is detected, the detector 402 outputs an enable signal to a respective
modulator 406, implemented for each critical band as an FSK modulator tuned to the
geometric mean of the two carrier frequencies of the band. The output of each of the
modulators 406 is input to a summer 408 along with the audio signal output from the
original audio file 108, resulting in a watermarked audio signal being output from the
summer.

In a preferred embodiment, each time an opportunity is detected by one of the
detectors 402, a timer 410 is reset to the value specified by the user in the "Max. Time
Between Transmissions" field 206 (Fig. 2), or to the default value, if the user did not specify a
value. When the timer 410 times out, it enables a GMSK modulator 412, the input to which
is the watermark data from the watermark data file 110, causing it to modulate the
watermark data, which modulated watermark data is output to a second summer 414 where
it is mixed with the output of an 18 kHz low pass filter ("LPF") 416, the input to which is
the audio data from the original audio file 108. The output of the summer 414 is input to a
two-input multiplexer ("MUX") 420, the other input of which is tied to the output of the
first summer 408. The output of the timer 410 is tied to the select input of the MUX 420
such that, when the timer times out, the output of the second summer 414 is output from
the MUX 420 as the watermarked audio signal. As a result, whenever a specified maximum
amount of time elapses between opportunities, an opportunity is "forced" by cross-fading in
the LPF 416 to clean out the band above 18 kHz, GMSK modulating the watermark data at
carrier frequencies 18.5 kHz (for logic 0) and 19.5 kHz (for logic 1) and a data rate of 1200
bps, and then cross-fading out the LPF.
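The timer behaviour described above can be illustrated with a small scheduling sketch in Python. It is a simplification under stated assumptions: natural opportunities are given as timestamps in seconds, the timer starts at time zero, transmissions are treated as instantaneous, and a forced transmission also resets the timer (as the flowchart of Fig. 5 suggests). The function name and the event labels are hypothetical.

```python
def schedule_transmissions(opportunity_times_s, duration_s, max_gap_s=3.0):
    """Return a list of (time_s, kind) events, kind being 'natural' or 'forced'.

    A forced event is emitted whenever max_gap_s elapses with no natural
    opportunity; every event restarts the timer.
    """
    events = []
    deadline = max_gap_s  # assumption: the timer is started at t = 0
    for t in sorted(opportunity_times_s):
        # Emit forced transmissions for every timeout that precedes t.
        while deadline < t:
            events.append((deadline, "forced"))
            deadline += max_gap_s
        events.append((t, "natural"))
        deadline = t + max_gap_s
    # Keep forcing until the end of the audio.
    while deadline <= duration_s:
        events.append((deadline, "forced"))
        deadline += max_gap_s
    return events
```

With the default of 3 seconds from the description, a long quiet stretch in the natural opportunities shows up as a run of forced events spaced 3 seconds apart.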

The output of the MUX 420 is stored in a temporary audio file 118 in the memory
device 106. At this point, the user can audition the original audio file 108 against the
watermarked audio signal stored in the temporary audio file 118. A conventional .WAV
viewer window (not shown) is displayed on the display 14 (Fig. 1) and has controls for
advancing to the next or previous watermark position and for auditioning the original ("A"),
watermarked ("B"), or unknown random ("X") version, which allows a user to listen to the
original or watermarked version without knowing which version they are listening to,
thereby eliminating any personal bias that might affect the user's perception of the
watermark. During the auditioning phase, the user may amplify or attenuate the level of
each watermark instance via a level control with a range of +/- 20 dB. This level will be
applied to that instance of the watermark during the next run of the encoder.
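As a small illustration of the level control, the following Python sketch converts a per-instance setting in dB to a linear scale factor. It assumes the dB value refers to signal amplitude (a factor of 10 per 20 dB) and that out-of-range settings are clamped to the +/- 20 dB range; neither detail is stated in the description, and the function name is hypothetical.

```python
def watermark_gain(level_db, limit_db=20.0):
    """Convert a level-control setting in dB to a linear amplitude factor.

    Settings outside +/- limit_db are clamped (an assumption of this sketch).
    """
    clamped = max(-limit_db, min(limit_db, level_db))
    return 10.0 ** (clamped / 20.0)
```

On the next encoder run, each sample of that watermark instance would be multiplied by the returned factor before being mixed with the audio.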

In addition, information concerning the total number of opportunities and the
average and maximum time between them is stored in a statistics file 122 and can be
displayed to the user on the display 14 (Fig. 1) so that he or she can determine how well the
current settings for the algorithm parameters performed. At this point, the user may wish
to change some of the parameters, for example, if the average time between transmissions is
too great or the total number of opportunities is too small.
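The three statistics named above can be computed from a log of opportunity times, as in the following Python sketch. The input format (a list of timestamps in seconds) and the dictionary keys are assumptions made for illustration; the description does not specify the layout of the statistics file 122.

```python
def opportunity_stats(times_s):
    """Summarize logged opportunity timestamps (seconds).

    Returns the total count plus the average and maximum gap between
    consecutive opportunities; gaps are None when fewer than two exist.
    """
    times = sorted(times_s)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "count": len(times),
        "avg_gap_s": sum(gaps) / len(gaps) if gaps else None,
        "max_gap_s": max(gaps) if gaps else None,
    }
```

A large `max_gap_s` relative to the configured maximum time between transmissions would suggest relaxing the headroom or enabling more bands.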

Once the user has auditioned the temporary file 118, the file can be saved in any one
of a number of known formats.

In one embodiment, the PEE is defined as an exponential decay that begins when a
burst of energy in a band is followed by at least 3 dB less energy in that band for 10 ms or
more. The value of the exponential at any time specifies the maximum level that the GMSK
signal may be transmitted at that time. The user control for "headroom" defines a further
attenuation under this exponential.
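The PEE definition above can be sketched in Python. Several details are assumptions of this sketch rather than statements of the description: band energy is available as per-frame dB readings, any frame may serve as the "burst", the envelope decays exponentially in amplitude with a hypothetical time constant `tau_ms`, and headroom is subtracted in dB. The function names are hypothetical.

```python
import math


def find_decay_starts(levels_db, frame_ms, drop_db=3.0, hold_ms=10.0):
    """Return frame indices i whose level is followed by at least drop_db
    less energy for at least hold_ms."""
    hold = max(1, math.ceil(hold_ms / frame_ms))  # frames that must stay low
    starts = []
    for i in range(len(levels_db) - hold):
        following = levels_db[i + 1:i + 1 + hold]
        if all(levels_db[i] - level >= drop_db for level in following):
            starts.append(i)
    return starts


def max_watermark_level_db(burst_db, elapsed_ms, tau_ms, headroom_db):
    """Maximum watermark level at elapsed_ms after the burst, in dB.

    Assumes amplitude decays as exp(-t / tau); tau_ms is hypothetical.
    """
    decay_db = 20.0 * math.log10(math.exp(-elapsed_ms / tau_ms))
    return burst_db + decay_db - headroom_db
```

With these assumptions the permitted level falls by about 8.7 dB per time constant, and the headroom setting shifts the whole curve down by a fixed amount.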

Fig. 5 is a flowchart of the operation of the encoder 102 (Fig. 1A). It should be
recognized that the algorithm described with reference to Fig. 5 is implemented for each of
the critical bands designated as active by the user as described above with reference to Fig.
2. Moreover, in connection with Fig. 5, "audio signal" shall be deemed to refer to either the
signal stored in the original audio file 108, if no preconditioning has been performed, or the
preconditioned audio signal stored in the preconditioned audio file 116, if preconditioning
has been performed. In step 500, a determination is made whether the maximum time
between transmissions, as specified by the user as described above with reference to Fig. 2,
has elapsed. This step is performed by determining whether the timer 410 (Fig. 4) has timed
out. If not, execution proceeds to step 502, in which a determination is made whether a data
burst, or opportunity, has been detected in the audio signal. If not, execution returns to
step 500; otherwise, execution proceeds to step 504.

In step 504, a determination is made whether the data burst is followed by at least 3
dB less energy in the band for at least 10 ms. If not, execution returns to step 500;
otherwise, execution proceeds to step 506, in which the opportunity is logged, and then to
step 508, in which a watermark data subframe is generated.

In a preferred embodiment, the watermark is formatted as a frame of 32 characters.
Each portion of watermark data to be inserted at a given opportunity ("subframe") is
preceded by a 4-bit preamble. In addition to the four preamble bits, additional bits must be
allocated in each subframe to indicate which piece of the overall watermark the present
burst carries. If the seven bands are used, there are a minimum of 16 bits per transmission.
Therefore, four more bits may be used to indicate which character the present character is
and there are at least eight bits left over to carry actual watermark data. If a higher
frequency band carries more than 16 bits, then the preamble indicates the index of the first
character of the transmission.
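The minimum 16-bit subframe described above (four preamble bits, four index bits, eight data bits) can be sketched as a packed integer in Python. The field order and most-significant-bit-first layout are assumptions of this sketch, the preamble value 1010 is the example pattern named in the description, and the function names are hypothetical. A 4-bit index can address only 16 positions, so how a 32-character frame would be indexed is not settled by this sketch.

```python
PREAMBLE = 0b1010  # example pattern from the description (hex A)


def pack_subframe(index, char_code):
    """Pack a 16-bit subframe: 4-bit preamble, 4-bit index, 8-bit character."""
    if not 0 <= index < 16:
        raise ValueError("index must fit in 4 bits")
    if not 0 <= char_code < 256:
        raise ValueError("character must fit in 8 bits")
    return (PREAMBLE << 12) | (index << 8) | char_code


def unpack_subframe(word):
    """Return (index, char_code), or None if the preamble does not match."""
    if (word >> 12) & 0xF != PREAMBLE:
        return None
    return (word >> 8) & 0xF, word & 0xFF
```

For example, character "A" (code 65) at index 5 packs to the 16-bit word 0xA541 under this layout.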

Alternatively, rather than using a 4-bit index, one preamble could be
assigned to indicate the start of a frame and another assigned to the rest of the frame, in
which case 12 bits of each transmission would be left for carrying data.

Referring again to Fig. 5, in step 510, the watermark data subframe is combined with
the audio signal at a level defined by the user as described above with reference to Fig. 2.
In step 512, a determination is made whether the entire watermark frame has been sent. If
not, execution proceeds to step 513, in which the timer 410 is reset, and then returns to step
500; otherwise, execution proceeds to step 514 and the watermarked audio file is saved as a
temporary file.

Referring again to step 500, if a determination is made that the maximum time
between transmissions has elapsed, execution proceeds to step 516, in which an opportunity
is forced, as described with reference to Fig. 4. Upon completion of step 516, execution
proceeds to step 506.

As previously indicated, the temporary file generated as a result of the encoding 
described with reference to Fig. 5 may be auditioned and the parameters therefore changed 
prior to the watermarked signal being saved as a permanent file. 

As previously indicated, in one embodiment, the entire watermark is sent once, with
the various subframes transmitted in the various active critical bands, such that a portion of
the watermark may be sent in each of the active bands, thereby increasing the number of
opportunities for inserting the watermark. In another embodiment, all of the watermark
data is inserted in each of the bands, such that the watermark appears seven times in the
watermarked audio signal (assuming all of the bands are designated as active), thereby
providing redundancy.
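
The difference between the two embodiments can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, and the round-robin assignment of subframes to bands is an assumption of the sketch, since in the system described a subframe is sent in whichever active band presents an opportunity.

```python
from itertools import cycle


def distribute(subframes, active_bands):
    """First embodiment: each subframe is sent once, spread over the active bands.

    Round-robin assignment is assumed here purely for illustration.
    """
    plan = {band: [] for band in active_bands}
    for band, subframe in zip(cycle(active_bands), subframes):
        plan[band].append(subframe)
    return plan


def replicate(subframes, active_bands):
    """Second embodiment: every active band carries the whole watermark."""
    return {band: list(subframes) for band in active_bands}
```

With seven active bands, `replicate` yields seven complete copies of the watermark, whereas `distribute` yields a single copy divided among the bands.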

Fig. 6 illustrates, in greater detail, the decoder portion 22 of Fig. 1A.
Initially, a watermarked audio signal is input to the decoder portion 22 from either an audio
file or via the audio input device 16 (Fig. 1). The watermarked audio signal is
simultaneously filtered by seven BPFs 601, each tuned to one of the critical bands defined in
Table 1. In a preferred embodiment, each of the BPFs 601 has a Gaussian-shaped band pass
response. The output of each of the BPFs 601 is input to a respective FSK demodulator 602,
each of which is implemented as a phase-locked loop ("PLL") tuned to the geometric mean
of the two carrier frequencies of the respective critical band. The result of each BPF
601/demodulator 602 pair is to GMSK demodulate the watermarked audio signal in the
respective critical band. The output of each of the demodulators 602 is input to a respective
data detector 604 which detects the watermark data. In a preferred embodiment, each of
the data detectors 604 is implemented as described below. The outputs of the data detectors
604, comprising the watermark data, are stored in a file in memory 106.
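
As a worked example of the PLL tuning described above, the following sketch computes the geometric mean of two carrier frequencies. The function name is hypothetical, and the carrier pair used in the usage note (18.5 kHz and 19.5 kHz) is borrowed from the forced-opportunity carriers recited in the claims merely as sample values, since Table 1 is not reproduced here.

```python
import math


def pll_center_hz(f_one_hz: float, f_zero_hz: float) -> float:
    """Geometric mean of the logic-1 and logic-0 carrier frequencies, in Hz."""
    return math.sqrt(f_one_hz * f_zero_hz)
```

For carriers at 18.5 kHz and 19.5 kHz, `pll_center_hz(18_500, 19_500)` gives a value slightly below the arithmetic mean of 19 kHz, which places the center frequency at the same frequency ratio from each carrier.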

As previously indicated, in the preferred embodiment, a four-bit preamble is used to
indicate the start of watermark data. The pattern of the preamble is largely arbitrary, but
should be selected to be something that is not likely to occur during idle conditions (i.e., not
all 0's or all 1's). For the sake of example, the pattern 1010 (hex A) has been chosen. The
decoder portion 22 performs FSK demodulation, using the demodulators 602, on the
incoming watermarked audio signal and the output of each is a stream of 0's and 1's. From
this, the watermark data will be detected.

To do this, the stream of 0's and 1's is sampled at a particular sample rate "Fs". If
the baud rate ("Fb") is related to the sample rate by a known ratio ("R"), e.g., R=Fs/Fb,
then the output from each FSK demodulator 602 can be routed through a detector 604
comprising a sliding window of width R, which watches for all 1's or all 0's, indicating what
appears to be a valid bit. Using four of these sliding comparators in each detector 604, the
full preamble can be detected, thus indicating the start of a transmission of the watermark.
This is a more efficient way to detect the preamble than using a 4 x R-wide detector.
Exemplary values for Fs, Fb, and R are 44.1 kHz, 630 bps, and 70, respectively.
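
The preamble search described above can be sketched as follows, using the exemplary values for Fs and Fb and the example pattern 1010. This is a naive illustration that re-examines each window at every sample offset rather than maintaining running counters; the function and constant names are hypothetical, and the input is assumed to be a sequence of demodulated 0/1 samples.

```python
from typing import Optional, Sequence

FS_HZ = 44_100
FB_BPS = 630
R = FS_HZ // FB_BPS  # 70 samples per bit
PREAMBLE = (1, 0, 1, 0)


def window_bit(samples: Sequence[int], start: int, width: int = R) -> Optional[int]:
    """Return 0 or 1 if the window holds only that value, otherwise None."""
    cell = samples[start:start + width]
    if len(cell) < width:
        return None
    first = cell[0]
    return first if all(s == first for s in cell) else None


def find_preamble(samples: Sequence[int], width: int = R) -> Optional[int]:
    """Return the sample index just after the first preamble, or None."""
    span = len(PREAMBLE) * width
    for start in range(len(samples) - span + 1):
        bits = tuple(
            window_bit(samples, start + k * width, width)
            for k in range(len(PREAMBLE))
        )
        if bits == PREAMBLE:
            return start + span
    return None
```

The returned index marks where the first bit cell of the data frame would begin.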

Once the preamble has been detected, a comparator of width R is used to detect each
bit of the data frame. If anything but all 0's or all 1's is detected in each bit cell, the whole
thing is thrown out, since it is either a faulty preamble detection (e.g., it was really audio
information that looked like the preamble) or the signal was interfered with by noise during
transmission. If R-1 or R+1 0's or 1's are detected, the sample rate might be off by a
fraction, so the discrepancy is ignored and the bit counter is reset upon the next state
change.
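
One way to picture the bit-cell check is the run-length sketch below. It is not the comparator arrangement itself: it groups the samples following the preamble into runs of identical values, accepts a run only if its length is within one sample of a whole number of bit cells, and discards the frame otherwise. Applying the one-sample tolerance per run, and the function name, are assumptions made for the example.

```python
from itertools import groupby
from typing import List, Optional, Sequence

R = 70  # samples per bit (44.1 kHz / 630 bps)


def decode_frame(samples: Sequence[int], n_bits: int,
                 width: int = R, tol: int = 1) -> Optional[List[int]]:
    """Decode n_bits from samples that start right after the preamble.

    Returns None if a run does not fit a whole number of bit cells
    (within tol samples) or if the samples run out early.
    """
    bits: List[int] = []
    for value, run in groupby(samples):
        length = sum(1 for _ in run)
        remaining = n_bits - len(bits)
        cells = round(length / width)
        if cells >= remaining and length >= remaining * width - tol:
            # This run covers the rest of the frame; extra samples
            # belong to whatever follows the frame.
            bits.extend([value] * remaining)
            return bits
        if cells == 0 or abs(length - cells * width) > tol:
            return None  # glitch or noise: throw the whole frame out
        bits.extend([value] * cells)  # counting restarts at the state change
    return None
```

Because each run is measured from its own state change, a one-sample discrepancy in one run does not carry over into the next.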

In one embodiment, the invention described herein is implemented as a DirectX®
plug-in to take advantage of the non-real-time capabilities of personal computer-based
software, such as Cakewalk® and Sound Forge®. DirectX®, Cakewalk®, and Sound
Forge® are registered trademarks of Microsoft Corporation, of Redmond, Washington,
Twelve Tone Systems, of Watertown, Massachusetts, and Sonic Foundry, Inc., of Madison,
Wisconsin, respectively.

Although illustrative embodiments of the invention have been shown and described,
a wide range of modification, change, and substitution is intended in the foregoing
disclosure and in some instances, some features of the present invention may be employed
without a corresponding use of the other features. Accordingly, it is appropriate that the
appended claims be construed broadly and in a manner consistent with the scope of the
invention.

WHAT IS CLAIMED IS:

1. A method of providing a digital watermark in an audio signal, the method 
comprising: 

dividing a frequency spectrum of the audio signal into a plurality of critical bands;

specifying as active at least one of the critical bands;

monitoring the audio signal in each of the active critical bands to detect
opportunities for inserting watermark data;

responsive to detection of each opportunity:

logging the opportunity;

encoding a portion of the watermark data; and

adding the encoded portion of the watermark data to the audio signal at each
of the detected opportunities to create a watermarked audio signal; and

storing the watermarked audio signal in a second audio file.

2. The method of claim 1 further comprising:

determining whether the method is being performed in real-time; and 

if the method is not being performed in real-time, preconditioning the audio signal. 

3. The method of claim 2 wherein the preconditioning comprises:

detecting a near-miss opportunity in the audio signal; and

creating an opportunity from the near-miss opportunity.

4. The method of claim 1 wherein each of the critical bands includes first and
second carrier frequencies for representing logic 1 and logic 0, respectively, within the
critical band and wherein a data rate is specified for each of the critical bands such that all
of the data rates are related by a power of two.

5. The method of claim 1 further comprising:

auditioning the second audio file; and

comparing the second audio file with the first audio file.

6. The method of claim 1 wherein the encoding comprises modulating the
watermark data within the critical band using Gaussian Minimal Shift Key ("GMSK")
modulation.

7. The method of claim 1 wherein the plurality of critical bands comprise seven 
critical bands. 

8. The method of claim 1 further comprising receiving user-specified parameters 
for detecting opportunities and encoding the watermark data. 

9. The method of claim 8 wherein the user-specified parameters comprise a 
headroom parameter for defining a level of transmission of the encoded portion of the 
watermark data relative to the audio signal. 

10. The method of claim 8 wherein the user-specified parameters comprise a
maximum time between transmissions parameter for defining the maximum time that
should be allowed to elapse between opportunities for adding the encoded watermark data in
the audio signal.

11. The method of claim 8 wherein the adding further comprises adding the 
encoded portion of the watermark data to the audio signal at a level specified by the user. 

12. The method of claim 1 wherein the specifying as active at least one of the 
critical bands is performed by the user. 

13. The method of claim 1 wherein the detecting opportunities further comprises,
for each active critical band, monitoring the critical band for a data burst followed by a
period of no energy.

14. The method of claim 1 wherein the opportunities are defined by a perceptual
entropy envelope of the audio signal within a critical band.

15. The method of claim 1 further comprising: 

determining whether a maximum time between transmissions as specified by a user 
has elapsed since a last transmission; and 

if the specified maximum time between transmissions has elapsed, forcing an 
opportunity. 

16. The method of claim 15 wherein the forcing an opportunity comprises:

cross-fading in a low pass filter ("LPF");

transmitting the watermark signal using GMSK at first and second carrier
frequencies for representing logic 1 and logic 0, respectively; and

cross-fading out the LPF.

17. The method of claim 16 wherein the LPF is an 18 kHz LPF and the first and 
second frequencies are 18.5 kHz and 19.5 kHz, respectively. 

18. The method of claim 1 further comprising:

providing to a user an indication of the number of opportunities and a maximum and
average time between opportunities.

19. Apparatus for providing a digital watermark in an audio signal, the apparatus 
comprising: 

means for dividing a frequency spectrum of the audio signal into a plurality of critical
bands;

means for specifying as active at least one of the critical bands;

means for monitoring the audio signal in each of the active critical bands to detect
opportunities for inserting watermark data;

means responsive to detection of each opportunity for logging the opportunity,
encoding a portion of the watermark data, and adding the encoded portion of the watermark
data to the audio signal at each of the detected opportunities to create a watermarked audio
signal; and

means for storing the watermarked audio signal in a second audio file. 

20. The apparatus of claim 19 further comprising: 

means for determining whether the method is being performed in real-time; and 
means for preconditioning the audio signal if the method is not being performed in 
real-time. 

21. The apparatus of claim 20 wherein the preconditioning comprises: 
detecting a near-miss opportunity in the audio signal; and 

creating an opportunity from the near-miss opportunity. 

22. The apparatus of claim 19 wherein each of the critical bands includes first and 
second carrier frequencies for representing logic 1 and logic 0, respectively, within the 
critical band and wherein a data rate is specified for each of the critical bands such that all 
of the data rates are related by a power of two. 

23. The apparatus of claim 19 further comprising:
means for auditioning the second audio file; and 

means for comparing the second audio file with the first audio file. 

24. The apparatus of claim 19 wherein the encoding comprises means for 
modulating the watermark data within the critical band using Gaussian Minimal Shift Key 
("GMSK") modulation. 

25. The apparatus of claim 19 wherein the plurality of critical bands comprise 
seven critical bands.

26. The apparatus of claim 19 further comprising means for receiving
user-specified parameters for detecting opportunities and encoding the watermark data.

27. The apparatus of claim 26 wherein the user-specified parameters comprise a 
headroom parameter for defining a level of transmission of the encoded portion of the 
watermark data relative to the audio signal. 

28. The apparatus of claim 26 wherein the user-specified parameters comprise a 
maximum time between transmissions parameter for defining the maximum time that 
should be allowed to elapse between opportunities for adding the encoded watermark data in 
the audio signal. 

29. The apparatus of claim 26 wherein the means for adding further comprises 
means for adding the encoded portion of the watermark data to the audio signal at a level 
specified by the user. 

30. The apparatus of claim 19 wherein the means for detecting opportunities
further comprises, for each active critical band, means for monitoring the critical band for a
data burst followed by a period of no energy.

31. The apparatus of claim 19 wherein the opportunities are defined by a
perceptual entropy envelope of the audio signal within a critical band.

32. The apparatus of claim 19 further comprising: 

means for determining whether a maximum time between transmissions as specified 
by a user has elapsed since a last transmission; and 

means for forcing an opportunity if the specified maximum time between
transmissions has elapsed.

33. The apparatus of claim 32 wherein the means for forcing an opportunity
comprises:

means for cross-fading in a low pass filter ("LPF");

means for transmitting the watermark signal using GMSK at first and second carrier
frequencies for representing logic 1 and logic 0, respectively; and

means for cross-fading out the LPF.

34. The apparatus of claim 33 wherein the LPF is an 18 kHz LPF and the first
and second frequencies are 18.5 kHz and 19.5 kHz, respectively.

35. The apparatus of claim 19 further comprising: 

means for providing to a user an indication of the number of opportunities and a
maximum and average time between opportunities.

36. A system for adding a digital watermark to an audio signal, the system
comprising:

an encoding portion; 

a memory device connected to the encoding portion; 
a user input device; and 
a user interface.

37. The system of claim 36 wherein the encoding portion comprises an encoder for
monitoring a critical band of the audio signal to detect an opportunity to insert watermark
data such that it is masked by the audio signal, encoding the watermark data responsive to
detection of the opportunity, and inserting the encoded watermark data in the audio signal
at the opportunity.

38. The system of claim 36 further comprising a preconditioner for
preconditioning the audio signal to create an opportunity from a near-miss opportunity and
storing the preconditioned audio signal in the memory device.

39. The system of claim 37 wherein the encoder further comprises:

at least one band pass filter ("BPF") connected to receive the audio signal, the at 
least one BPF being tuned to the critical band; 

at least one detector having an input connected to receive the filtered audio signal 
output from the at least one BPF, the at least one detector for detecting an opportunity in 
the critical band of the filtered audio signal to insert watermark data therein such that the 
watermark data is masked by the audio signal, the detector outputting a control signal upon 
detection of an opportunity; 

at least one frequency shift key ("FSK") modulator connected to receive the control 
signal from the at least one detector, the at least one FSK modulator encoding the 
watermark data responsive to receipt of the control signal; and 

a first summer connected to receive the encoded watermark data and for adding the 
encoded watermark data to the audio signal. 

40. The system of claim 39 wherein the encoder further comprises: 
a low pass filter ("LPF") for low pass filtering the audio signal; 

a timer for generating a time out signal upon the elapse of a predetermined 
maximum time period since a previous opportunity has been detected; 

a second FSK modulator connected to receive the time out signal from the timer, the 
second FSK modulator encoding the watermark data responsive to receipt of the time out 
signal; and 

a second summer for adding the encoded watermark data from the second FSK 
modulator with the filtered audio signal output from the LPF filter. 

41. The system of claim 40 further comprising a multiplexer having inputs 
connected to receive the outputs of the first and second summers, respectively, and a select 
input connected to receive the time out signal, such that, upon receipt of the time out signal, 
the output from the second summer is output from the multiplexer; otherwise, the output 
from the first summer is output from the multiplexer.

42. The system of claim 40 wherein the LPF is tuned to 18 kHz.

43. The system of claim 39 further comprising: 

a plurality of BPFs each connected to receive the audio signal, each of the BPFs
tuned to a respective critical band;

a plurality of detectors each having an input connected to receive the filtered audio
signal output from a respective one of the BPFs, wherein each of the detectors detect an
opportunity in the respective critical band of the filtered audio signal to insert watermark
data therein such that the watermark data is masked by the audio signal and outputting a
control signal upon detection of an opportunity;

a plurality of FSK modulators connected to receive the control signal from a
respective one of the detectors, wherein each of the FSK modulators encode the watermark
data responsive to receipt of the respective control signal; and

wherein the first summer is connected to receive the encoded watermark data from
each of the FSK modulators and to add the encoded watermark data to the audio signal.

44. The system of claim 36 wherein the encoding portion further comprises a 
preconditioner. 

45. The system of claim 44 wherein the preconditioner comprises:

at least one band pass filter ("BPF") connected to receive the audio signal, the at
least one BPF being tuned to the critical band;

at least one near miss detector ("NMD") having an input connected to receive the
filtered audio signal output from the at least one BPF, the at least one NMD for detecting a
near-miss opportunity in the critical band of the filtered audio signal to insert watermark
data therein such that the watermark data is masked by the audio signal, the detector
outputting a control signal upon detection of a near-miss opportunity;

at least one band reject filter ("BRF") connected to receive the audio signal, the at
least one BRF responsive to the control signal for adjusting a gain of the audio signal to
create an opportunity in the critical band of the filtered audio signal; and

a summer connected to receive the adjusted audio signal output from the at least one
BRF.

46. The system of claim 45 wherein the preconditioner further comprises: 

a plurality of BPFs each connected to receive the audio signal, wherein each of the
BPFs is tuned to one of a plurality of critical bands;

a plurality of NMDs each having an input connected to receive the filtered audio
signal output from a respective one of the BPFs, wherein each of the NMDs detect a
near-miss opportunity in the critical band of the filtered audio signal to insert watermark data
therein such that the watermark data is masked by the audio signal and output a control
signal upon detection of a near-miss opportunity; and

a plurality of BRFs each connected to receive the audio signal, wherein each of the
BRFs are responsive to the control signal for adjusting a gain of the audio signal to create an
opportunity in the critical band of the filtered audio signal;

wherein the summer is connected to receive the adjusted audio signal output from
each of the BRFs and add the received signals together.

47. The system of claim 36 further comprising a decoder including 

at least one band pass filter ("BPF") connected to receive a watermarked audio
signal, the at least one BPF being tuned to the critical band;

at least one frequency shift key ("FSK") demodulator connected to receive filtered
watermarked audio signal output from the at least one BPF, the at least one FSK
demodulator demodulating the filtered watermarked audio signal input thereto; and

at least one detector having an input connected to receive the demodulated 
watermarked audio signal output from the at least one FSK demodulator, the at least one 
detector for detecting watermark data from the demodulated watermarked audio signal; 

wherein the detected watermark data is stored in the memory device. 

48. The system of claim 47 wherein the decoder further comprises

a plurality of BPFs each connected to receive a watermarked audio signal, each of the
BPFs being tuned to one of a plurality of critical bands;

a plurality of FSK demodulators each connected to receive the filtered watermarked
audio signal output from a respective one of the BPFs for demodulating the filtered
watermarked audio signal input thereto; and

a plurality of detectors each having an input connected to receive the demodulated 
watermarked audio signal output from a respective one of the FSK demodulators for 
detecting watermark data from the demodulated watermarked audio signal; 

wherein the detected watermark data is stored in the memory device. 

49. The system of claim 36 wherein the user interface device comprises a screen 
display presented on a display of a computer. 

50. A method of recovering a watermark from a watermarked audio signal, the
method comprising:

filtering the watermarked audio signal using at least one band pass filter ("BPF")
tuned to a critical frequency band;

demodulating the watermarked audio signal using Gaussian Minimal Shift Key
("GMSK") modulation; and

detecting watermark data from the demodulated watermarked audio signal.

51. The method of claim 50 further comprising: 
storing the detected watermark data in a memory device. 

52. The method of claim 50 wherein the filtering further comprises filtering the
watermarked signal using a plurality of BPFs each tuned to one of a plurality of bands.

53. The method of claim 52 further comprising recovering a portion of the 
watermark data from each of the critical bands. 

54. The method of claim 52 further comprising recovering all of the watermark
data from each of the critical bands.

55. A decoder for recovering watermark data from a watermarked audio signal,
the decoder comprising:

at least one band pass filter ("BPF") connected to receive the watermarked audio
signal, the at least one BPF being tuned to the critical band;

at least one frequency shift key ("FSK") demodulator connected to receive filtered
watermarked audio signal output from the at least one BPF, the at least one FSK
demodulator demodulating the filtered watermarked audio signal input thereto; and

at least one detector having an input connected to receive the demodulated 
watermarked audio signal output from the at least one FSK demodulator, the at least one 
detector for detecting watermark data from the demodulated watermarked audio signal. 

56. The system of claim 55 wherein the decoder further comprises

a plurality of BPFs each connected to receive a watermarked audio signal, each of the
BPFs being tuned to one of a plurality of critical bands;

a plurality of FSK demodulators each connected to receive the filtered watermarked
audio signal output from a respective one of the BPFs for demodulating the filtered
watermarked audio signal input thereto; and

a plurality of detectors each having an input connected to receive the demodulated 
watermarked audio signal output from a respective one of the FSK demodulators for 
detecting watermark data from the demodulated watermarked audio signal. 

57. The decoder of claim 55 further comprising a memory device connected to
receive an output of the at least one detector.

58. The decoder of claim 56 further comprising a memory device connected to 
receive an output of each of the detectors. 

59. The decoder of claim 57 wherein a single instance of the watermark data is
stored in the memory device.

60. The decoder of claim 58 wherein multiple instances of the watermark data are
stored in the memory device.